Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
Word embeddings have recently seen a strong increase in interest as a result
of performance gains on a variety of tasks. However, much of this research has
also underlined the importance of benchmark datasets, and the difficulty of
constructing these for a variety of language-specific tasks.
Still, many of the datasets used in these tasks could prove to be fruitful
linguistic resources, allowing for unique observations into language use and
variability. In this paper, we demonstrate the performance of multiple types of
embeddings, created with both count-based and prediction-based architectures on
a variety of corpora, in two language-specific tasks: relation evaluation and
dialect identification. For the latter, we compare unsupervised methods with a
traditional, hand-crafted dictionary. With this research, we provide the
embeddings themselves, the relation evaluation task benchmark for use in
further research, and demonstrate that the benchmarked embeddings serve as a
useful unsupervised linguistic resource, used effectively in a downstream task.
Comment: in LREC 201
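Relation (analogy) evaluation of the kind described above is commonly scored with the 3CosAdd vector-offset method. A minimal sketch with a hypothetical toy embedding table (the words and vectors below are invented for illustration; real vectors would be loaded from the released embedding files, e.g. via gensim):

```python
import numpy as np

# Toy embedding table (hypothetical vectors, for illustration only).
emb = {
    "man":      np.array([1.0, 0.0, 0.1]),
    "woman":    np.array([1.0, 1.0, 0.1]),
    "koning":   np.array([0.2, 0.0, 1.0]),  # Dutch: "king"
    "koningin": np.array([0.2, 1.0, 1.0]),  # Dutch: "queen"
    "fiets":    np.array([0.0, 0.0, 1.0]),  # Dutch: "bicycle" (distractor)
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def solve_analogy(a, b, c):
    """3CosAdd: return the word d maximizing cos(d, b - a + c),
    excluding the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(solve_analogy("man", "woman", "koning"))  # -> koningin
```

Benchmark accuracy is then simply the fraction of relation pairs for which the predicted word matches the gold answer.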
Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts
In this paper, we report a knowledge-based method for Word Sense
Disambiguation in the domains of biomedical and clinical text. We combine word
representations created on large corpora with a small number of definitions
from the UMLS to create concept representations, which we then compare to
representations of the context of ambiguous terms. Using no relational
information, we obtain comparable performance to previous approaches on the
MSH-WSD dataset, which is a well-known dataset in the biomedical domain.
Additionally, our method is fast and easy to set up and extend to other
domains. Supplementary materials, including source code, can be found at
https://github.com/clips/yarn
Comment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical
Natural Language Processing, Berlin 201
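The approach described above (averaging definition word vectors into a concept representation and comparing it against an averaged context representation) can be sketched as follows. All vectors and definitions below are toy values for illustration, not the UMLS data or corpus-trained vectors used in the paper:

```python
import numpy as np

# Hypothetical 2-d word vectors; in the paper these are trained on large corpora.
vec = {
    "virus":       np.array([1.0, 0.0]),
    "illness":     np.array([0.95, 0.1]),
    "fever":       np.array([0.9, 0.2]),
    "low":         np.array([0.1, 1.0]),
    "temperature": np.array([0.2, 0.9]),
    "weather":     np.array([0.0, 1.0]),
    "patient":     np.array([0.8, 0.3]),
}

def mean_vector(tokens):
    """Average the vectors of all in-vocabulary tokens."""
    return np.mean([vec[t] for t in tokens if t in vec], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Concept representations built from toy, UMLS-style definition texts
# for two senses of the ambiguous term "cold".
concepts = {
    "C_common_cold":      mean_vector(["virus", "illness", "fever"]),
    "C_cold_temperature": mean_vector(["low", "temperature", "weather"]),
}

def disambiguate(context_tokens):
    """Pick the concept whose representation is closest to the context."""
    ctx = mean_vector(context_tokens)
    return max(concepts, key=lambda c: cosine(concepts[c], ctx))

print(disambiguate(["patient", "fever", "virus"]))  # -> C_common_cold
```

Because both sides reduce to vector averaging and a cosine comparison, extending the method to a new domain only requires word vectors and a small set of sense definitions.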
A Short Review of Ethical Challenges in Clinical Natural Language Processing
Clinical NLP has immense potential to contribute to the transformation of
clinical practice through large-scale processing of clinical records. However,
this potential has remained largely untapped because progress has been slowed,
primarily by strict data access policies for researchers.
In this paper, we discuss the concern for privacy and the measures it entails.
We also suggest sources of less sensitive data. Finally, we draw attention to
biases that can compromise the validity of empirical research and lead to
socially harmful applications.
Comment: First Workshop on Ethics in Natural Language Processing (EACL'17)
The produsing expert consumer : co-constructing, resisting and accepting health-related claims on social media in response to an infotainment show about food and nutrition
This article examines the Twitter and Facebook uptake of health messages from an infotainment TV show on food, as broadcast on Belgium’s Dutch-language public broadcaster. Interest in, and the amount of, health-related media coverage is rising; this coverage is an important source of information for laypeople, and it influences their health behaviours and therapy compliance. However, the role of the audience has also changed: consumers of media content are increasingly produsers and, in the case of health, expert consumers. To explore how current audiences react to health claims, we conducted a quantitative and qualitative content analysis of Twitter and Facebook reactions to an infotainment show about food and nutrition. We examine (1) which elements in the show the audience reacts to, to gain insight into the traction the nutrition-related content generates, and (2) whether audience members accept
or resist the health information in the show. Our findings show that information on health and on production elicits the most reactions, and that the health information incites frequent refutation, low acceptance, and many suggestions of new information or new angles to complement the show’s information.
Embarrassingly Simple Unsupervised Aspect Extraction
We present a simple but effective method for aspect identification in
sentiment analysis. Our unsupervised method only requires word embeddings and a
POS tagger, and is therefore straightforward to apply to new domains and
languages. We introduce Contrastive Attention (CAt), a novel single-head
attention mechanism based on an RBF kernel, which gives a considerable boost in
performance and makes the model interpretable. Previous work relied on
syntactic features and complex neural models. We show that given the simplicity
of current benchmark datasets for aspect extraction, such complex models are
not needed. The code to reproduce the experiments reported in this paper is
available at https://github.com/clips/cat
Comment: Accepted as ACL 2020 short paper
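An attention mechanism of the kind described, weighting each word by its RBF-kernel similarity to a set of candidate aspect vectors, can be sketched as follows. The gamma value and all vectors are illustrative assumptions, not the released implementation:

```python
import numpy as np

def rbf(x, y, gamma=0.1):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def contrastive_attention(word_vecs, candidate_vecs, gamma=0.1):
    """Score each word by its summed RBF similarity to the candidate
    aspect vectors, normalize over the sentence to get attention weights,
    and return the attention-weighted sentence representation."""
    scores = np.array([sum(rbf(w, c, gamma) for c in candidate_vecs)
                       for w in word_vecs])
    weights = scores / scores.sum()
    sentence_vec = weights @ np.asarray(word_vecs)
    return weights, sentence_vec

# Toy example: one aspect-like content word and one function word.
words = [np.array([1.0, 0.0]),    # e.g. "pizza"
         np.array([0.0, 1.0])]    # e.g. "the"
aspects = [np.array([0.9, 0.1])]  # candidate aspect vector near "pizza"
weights, sent = contrastive_attention(words, aspects)
print(weights)  # the first word receives the higher weight
```

The resulting sentence vector would then be assigned to an aspect label by nearest-neighbour comparison against aspect-label embeddings, which is what keeps the model interpretable: the attention weights directly show which words drove the assignment.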